whole-body controller


Humanoid Whole-Body Badminton via Multi-Stage Reinforcement Learning

Liu, Chenhao, Jiang, Leyun, Wang, Yibo, Yao, Kairan, Fu, Jinchen, Ren, Xiaoyu

arXiv.org Artificial Intelligence

A fully autonomous humanoid returns machine-fed shuttles in a motion-capture arena; overlaid arcs show an incoming (blue) and returned (orange) trajectory.

Abstract -- Humanoid robots have demonstrated strong capabilities for interacting with static scenes across locomotion, manipulation, and more challenging loco-manipulation tasks. Yet the real world is dynamic, and quasi-static interactions are insufficient to cope with diverse environmental conditions. As a step toward more dynamic interaction scenarios, we present a reinforcement-learning-based training pipeline that produces a unified whole-body controller for humanoid badminton, enabling coordinated lower-body footwork and upper-body striking without motion priors or expert demonstrations. Training follows a three-stage curriculum -- first footwork acquisition, then precision-guided racket swing generation, and finally task-focused refinement -- yielding motions in which both legs and arms serve the hitting objective. For deployment, we incorporate an Extended Kalman Filter (EKF) to estimate and predict shuttlecock trajectories for target striking. We also introduce a prediction-free variant that dispenses with the EKF and explicit trajectory prediction. To validate the framework, we conduct five sets of experiments in both simulation and the real world. In simulation, two robots sustain a rally of 21 consecutive hits. Moreover, the prediction-free variant achieves successful hits with performance comparable to the target-known policy. In real-world tests, both the prediction and controller modules exhibit high accuracy, and on-court hitting achieves an outgoing shuttle speed of up to 19.1 m/s with a mean return landing distance of 4 m. These experimental results show that our proposed training scheme can deliver highly dynamic yet precise goal striking in badminton and can be adapted to other dynamics-critical domains.
Humanoid platforms have been proposed as general-purpose embodied agents for human-compatible skills [1, 2, 3, 4, 5, 6, 7]. Despite rapid progress in locomotion and motion imitation, agile, contact-rich interactions with fast-moving objects under tight reaction windows remain underexplored.
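The EKF deployed for shuttlecock interception can be sketched as a standard predict/update loop over a point-mass model with quadratic aerodynamic drag, observed through position-only motion-capture fixes. The drag coefficient `kd`, noise settings, and the numerical Jacobian are illustrative assumptions for a minimal sketch, not the paper's actual parameters or code:

```python
import numpy as np

def f(x, dt, g=9.81, kd=0.2):
    """Euler step of a point-mass shuttle model with quadratic drag.
    State x = [px, py, pz, vx, vy, vz]; kd is a hypothetical drag coefficient."""
    p, v = x[:3], x[3:]
    a = np.array([0.0, 0.0, -g]) - kd * np.linalg.norm(v) * v
    return np.concatenate([p + dt * v, v + dt * a])

def ekf_step(x, P, z, dt, Q, R):
    """One EKF predict+update cycle; measurement z is a position fix."""
    n = x.size
    # Numerical Jacobian of f around the prior state (finite differences).
    F = np.zeros((n, n))
    eps = 1e-6
    fx = f(x, dt)
    for i in range(n):
        dx = np.zeros(n)
        dx[i] = eps
        F[:, i] = (f(x + dx, dt) - fx) / eps
    x_pred, P_pred = fx, F @ P @ F.T + Q
    H = np.hstack([np.eye(3), np.zeros((3, 3))])  # position-only observation
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)           # Kalman gain
    x_new = x_pred + K @ (z - H @ x_pred)
    P_new = (np.eye(n) - K @ H) @ P_pred
    return x_new, P_new
```

Rolling `f` forward from the filtered state, without updates, then yields the predicted trajectory used to pick a strike target.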


Opening the Sim-to-Real Door for Humanoid Pixel-to-Action Policy Transfer

Xue, Haoru, He, Tairan, Wang, Zi, Ben, Qingwei, Xiao, Wenli, Luo, Zhengyi, Da, Xingye, Castañeda, Fernando, Shi, Guanya, Sastry, Shankar, Fan, Linxi "Jim", Zhu, Yuke

arXiv.org Artificial Intelligence

Recent progress in GPU-accelerated, photorealistic simulation has opened a scalable data-generation path for robot learning, where massive physics and visual randomization allow policies to generalize beyond curated environments. Building on these advances, we develop a teacher-student-bootstrap learning framework for vision-based humanoid loco-manipulation, using articulated-object interaction as a representative high-difficulty benchmark. Our approach introduces a staged-reset exploration strategy that stabilizes long-horizon privileged-policy training, and a GRPO-based fine-tuning procedure that mitigates partial observability and improves closed-loop consistency in sim-to-real RL. Trained entirely on simulation data, the resulting policy achieves robust zero-shot performance across diverse door types and outperforms human teleoperators by up to 31.7% in task completion time under the same whole-body control stack. This represents the first humanoid sim-to-real policy capable of diverse articulated loco-manipulation using pure RGB perception.
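The group-relative idea behind GRPO-based fine-tuning fits in a few lines: each rollout's return is normalized against the mean and standard deviation of its sampling group, giving an advantage estimate without a learned value critic. This is a generic sketch of the GRPO advantage rule, not the paper's training code:

```python
import statistics

def grpo_advantages(rewards):
    """Group-relative advantages: normalize each rollout's return by its
    group's mean and std. Degenerate (zero-variance) groups get zero advantage."""
    mu = statistics.fmean(rewards)
    sd = statistics.pstdev(rewards)
    if sd == 0.0:
        return [0.0 for _ in rewards]
    return [(r - mu) / sd for r in rewards]
```

Rollouts that beat their group average get positive advantage and are reinforced; the rest are suppressed, which is what mitigates noisy returns under partial observability.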


HoMeR: Learning In-the-Wild Mobile Manipulation via Hybrid Imitation and Whole-Body Control

Sundaresan, Priya, Malhotra, Rhea, Miao, Phillip, Yang, Jingyun, Wu, Jimmy, Hu, Hengyuan, Antonova, Rika, Engelmann, Francis, Sadigh, Dorsa, Bohg, Jeannette

arXiv.org Artificial Intelligence

We introduce HoMeR, an imitation learning framework for mobile manipulation that combines whole-body control with hybrid action modes that handle both long-range and fine-grained motion, enabling effective performance on realistic in-the-wild tasks. At its core is a fast, kinematics-based whole-body controller that maps desired end-effector poses to coordinated motion across the mobile base and arm. Within this reduced end-effector action space, HoMeR learns to switch between absolute pose predictions for long-range movement and relative pose predictions for fine-grained manipulation, offloading low-level coordination to the controller and focusing learning on task-level decisions. We deploy HoMeR on a holonomic mobile manipulator with a 7-DoF arm in a real home. We compare HoMeR to baselines without hybrid actions or whole-body control across 3 simulated and 3 real household tasks such as opening cabinets, sweeping trash, and rearranging pillows. Across tasks, HoMeR achieves an overall success rate of 79.17% using just 20 demonstrations per task, outperforming the next best baseline by 29.17% on average. HoMeR is also compatible with vision-language models and can leverage their internet-scale priors to better generalize to novel object appearances, layouts, and cluttered scenes. In summary, HoMeR moves beyond tabletop settings and demonstrates a scalable path toward sample-efficient, generalizable manipulation in everyday indoor spaces. Code, videos, and supplementary material are available at: http://homer-manip.github.io
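The hybrid action space can be sketched as a small dispatch step: an absolute action commands a world-frame end-effector target for long-range motion, while a relative action adds a small delta to the current pose for fine-grained manipulation, and the whole-body controller handles base/arm coordination downstream. Function and argument names below are illustrative (position-only, for brevity), not HoMeR's actual interface:

```python
import numpy as np

def resolve_target_pose(current_pose, action, mode):
    """Map a hybrid policy action to a desired end-effector target.
    'absolute': action is a world-frame target (long-range movement).
    'relative': action is a small delta on the current pose (fine-grained)."""
    action = np.asarray(action, dtype=float)
    if mode == "absolute":
        return action
    if mode == "relative":
        return np.asarray(current_pose, dtype=float) + action
    raise ValueError(f"unknown action mode: {mode!r}")
```

Because the policy only ever emits targets in this reduced space, learning concentrates on task-level decisions rather than on base/arm coordination.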


DemoHLM: From One Demonstration to Generalizable Humanoid Loco-Manipulation

Fu, Yuhui, Xie, Feiyang, Xu, Chaoyi, Xiong, Jing, Yuan, Haoqi, Lu, Zongqing

arXiv.org Artificial Intelligence

Loco-manipulation is a fundamental challenge for humanoid robots to achieve versatile interactions in human environments. Although recent studies have made significant progress in humanoid whole-body control, loco-manipulation remains underexplored and often relies on hard-coded task definitions or costly real-world data collection, which limits autonomy and generalization. We present DemoHLM, a framework for humanoid loco-manipulation that enables generalizable loco-manipulation on a real humanoid robot from a single demonstration in simulation. DemoHLM adopts a hierarchy that integrates a low-level universal whole-body controller with high-level manipulation policies for multiple tasks. The whole-body controller maps whole-body motion commands to joint torques and provides omnidirectional mobility for the humanoid robot. The manipulation policies, learned in simulation via our data generation and imitation learning pipeline, command the whole-body controller with closed-loop visual feedback to execute challenging loco-manipulation tasks. Experiments show a positive correlation between the amount of synthetic data and policy performance, underscoring the effectiveness of our data generation pipeline and the data efficiency of our approach. Real-world experiments on a Unitree G1 robot equipped with an RGB-D camera validate the sim-to-real transferability of DemoHLM, demonstrating robust performance under spatial variations across ten loco-manipulation tasks.


RoMoCo: Robotic Motion Control Toolbox for Reduced-Order Model-Based Locomotion on Bipedal and Humanoid Robots

Dai, Min, Ames, Aaron D.

arXiv.org Artificial Intelligence

By leveraging reduced-order models for platform-agnostic gait generation, RoMoCo enables flexible controller design across diverse robots. We demonstrate its versatility and performance through extensive simulations on the Cassie, Unitree H1, and G1 robots, and validate its real-world efficacy with hardware experiments on the Cassie and G1 humanoids.


GMT: General Motion Tracking for Humanoid Whole-Body Control

Chen, Zixuan, Ji, Mazeyu, Cheng, Xuxin, Peng, Xuanbin, Peng, Xue Bin, Wang, Xiaolong

arXiv.org Artificial Intelligence

The ability to track general whole-body motions in the real world is a useful way to build general-purpose humanoid robots. However, achieving this can be challenging due to the temporal and kinematic diversity of the motions, the policy's capability, and the difficulty of coordinating the upper and lower bodies. To address these issues, we propose GMT, a general and scalable motion-tracking framework that trains a single unified policy to enable humanoid robots to track diverse motions in the real world. GMT is built upon two core components: an Adaptive Sampling strategy and a Motion Mixture-of-Experts (MoE) architecture. The Adaptive Sampling automatically balances easy and difficult motions during training. The MoE ensures better specialization of different regions of the motion manifold. Through extensive experiments in both simulation and the real world, we show the effectiveness of GMT, achieving state-of-the-art performance across a broad spectrum of motions using a unified general policy. Videos and additional information can be found at https://gmt-humanoid.github.io.
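One simple way to realize adaptive sampling of this kind is to weight each motion clip by the policy's recent tracking error, so hard clips are rehearsed more often. The softmax weighting and temperature below are a sketch of the idea only; GMT's concrete weighting scheme may differ:

```python
import math

def adaptive_weights(tracking_errors, temperature=1.0):
    """Softmax over per-clip tracking errors: clips the policy currently
    tracks poorly receive larger sampling probability during training."""
    m = max(tracking_errors)  # subtract max for numerical stability
    exps = [math.exp((e - m) / temperature) for e in tracking_errors]
    total = sum(exps)
    return [w / total for w in exps]
```

Lowering the temperature concentrates training on the hardest clips; raising it recovers near-uniform sampling, so the balance between easy and difficult motions is a single knob.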


HITTER: A HumanoId Table TEnnis Robot via Hierarchical Planning and Learning

Su, Zhi, Zhang, Bike, Rahmanian, Nima, Gao, Yuman, Liao, Qiayuan, Regan, Caitlin, Sreenath, Koushil, Sastry, S. Shankar

arXiv.org Artificial Intelligence

Our system enables both humanoid-humanoid (left) and humanoid-human (right) matches, achieving rallies of up to 106 consecutive shots against a human opponent.

Abstract -- Humanoid robots have recently achieved impressive progress in locomotion and whole-body control, yet they remain constrained in tasks that demand rapid interaction with dynamic environments through manipulation. Table tennis exemplifies such a challenge: with ball speeds exceeding 5 m/s, players must perceive, predict, and act within sub-second reaction times, requiring both agility and precision. To address this, we present a hierarchical framework for humanoid table tennis that integrates a model-based planner for ball trajectory prediction and racket target planning with a reinforcement learning-based whole-body controller. The planner determines striking position, velocity, and timing, while the controller generates coordinated arm and leg motions that mimic human strikes and maintain stability and agility across consecutive rallies. Moreover, to encourage natural movements, human motion references are incorporated during training. We validate our system on a general-purpose humanoid robot, achieving up to 106 consecutive shots with a human opponent and sustained exchanges against another humanoid.
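The core geometry of such a planner can be shown with a drag-free ballistic model: solve for the time at which the ball crosses a chosen strike height, then read off the racket target from the ball's horizontal motion. A real system would use a richer model (drag, spin, bounce), so treat this as a minimal sketch with made-up function names, not the paper's planner:

```python
def intercept_time(p0, v0, strike_height, g=9.81):
    """Solve p_z(t) = strike_height for a drag-free ballistic ball,
    p_z(t) = p0z + v0z*t - g*t^2/2. Returns the later positive root
    (the descending crossing), or None if the ball never reaches it."""
    a, b, c = -0.5 * g, v0[2], p0[2] - strike_height
    disc = b * b - 4.0 * a * c
    if disc < 0.0:
        return None
    roots = [(-b + disc ** 0.5) / (2.0 * a), (-b - disc ** 0.5) / (2.0 * a)]
    ts = [t for t in roots if t > 0.0]
    return max(ts) if ts else None

def strike_target(p0, v0, strike_height):
    """Racket target (x, y, z) and strike time t for the predicted ball state."""
    t = intercept_time(p0, v0, strike_height)
    if t is None:
        return None
    return (p0[0] + v0[0] * t, p0[1] + v0[1] * t, strike_height, t)
```

Given the target and timing, the learned whole-body controller is then responsible for getting the racket there with the commanded velocity.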


Humanoid Robot Acrobatics Utilizing Complete Articulated Rigid Body Dynamics

Brantner, Gerald

arXiv.org Artificial Intelligence

Endowing humanoid robots with the ability to perform highly dynamic motions akin to human-level acrobatics has been a long-standing challenge. Successfully performing these maneuvers requires close consideration of the underlying physics in both trajectory optimization for planning and control during execution. This is particularly challenging due to humanoids' high degree-of-freedom count and associated exponentially scaling complexities, which makes planning on the explicit equations of motion intractable. Typical workarounds include linearization methods and model approximations. However, neither is sufficient, because both produce degraded performance on the true robotic system. This paper presents a control architecture comprising trajectory optimization and whole-body control, intermediated by a matching model abstraction, that enables the execution of acrobatic maneuvers, including constraint and posture behaviors, conditioned on the unabbreviated equations of motion of the articulated rigid body model. A review of underlying modeling and control methods is given, followed by implementation details including model abstraction, trajectory optimization, and whole-body controller. The system's effectiveness is analyzed in simulation.


Hand-Eye Autonomous Delivery: Learning Humanoid Navigation, Locomotion and Reaching

Chen, Sirui, Ye, Yufei, Cao, Zi-Ang, Lew, Jennifer, Xu, Pei, Liu, C. Karen

arXiv.org Artificial Intelligence

We propose Hand-Eye Autonomous Delivery (HEAD), a framework that learns navigation, locomotion, and reaching skills for humanoids directly from human motion and vision perception data. We take a modular approach where the high-level planner commands the target position and orientation of the hands and eyes of the humanoid, delivered by the low-level policy that controls the whole-body movements. Specifically, the low-level whole-body controller learns to track the three points (eyes, left hand, and right hand) from existing large-scale human motion capture data, while the high-level policy learns from human data collected by Aria glasses. Our modular approach decouples ego-centric vision perception from physical actions, promoting efficient learning and scalability to novel scenes. We evaluate our method both in simulation and in the real world, demonstrating the humanoid's capabilities to navigate and reach in complex environments designed for humans.


Mobile Manipulation with Active Inference for Long-Horizon Rearrangement Tasks

Pezzato, Corrado, Çatal, Ozan, Van de Maele, Toon, Pitliya, Riddhi J., Verbelen, Tim

arXiv.org Artificial Intelligence

Despite growing interest in active inference for robotic control, its application to complex, long-horizon tasks remains untested. We address this gap by introducing a fully hierarchical active inference architecture for goal-directed behavior in realistic robotic settings. Our model combines a high-level active inference model, which selects among discrete skills, with a whole-body active inference controller that realizes them. This unified approach enables flexible skill composition, online adaptability, and recovery from task failures without requiring offline training. Evaluated on the Habitat Benchmark for mobile manipulation, our method outperforms state-of-the-art baselines across the three long-horizon tasks, demonstrating for the first time that active inference can scale to the complexity of modern robotics benchmarks.